11 research outputs found

    CRoute: a fast high-quality timing-driven connection-based FPGA router

    FPGA routing is an important part of physical design, as the programmable interconnection network occupies the majority of the total silicon area and the connections contribute largely to delay and power. Routing should also run with minimal runtime to enable efficient design exploration. In this work we elaborate on the connection-based routing principle. The algorithm is improved and a timing-driven version is introduced. The router, called CROUTE, is implemented in an easy-to-adapt FPGA CAD framework written in Java, which is publicly available on GitHub. Quality and runtime are compared to the state-of-the-art router in VPR 7.0.7. Benchmarking is done with the TITAN23 design suite, which consists of large heterogeneous designs targeted to a detailed representation of the Stratix IV FPGA. CROUTE improves both the total wirelength and the maximum clock frequency while reducing the routing runtime. The total wirelength is reduced by 11% and the maximum clock frequency increases by 6%. These high-quality results are obtained in 3.4x less routing runtime.

    Accelerating FPGA routing through algorithmic enhancements and connection-aware parallelization

    Routing is a crucial step in Field Programmable Gate Array (FPGA) physical design, as it determines the routes of the signals in the circuit, which significantly impacts the quality of the design implementation. Successfully routing all the signals of large circuits that utilize many FPGA resources can be very time-consuming. Attempts have been made to shorten the routing runtime for efficient design exploration while retaining high-quality implementations. In this work, we elaborate on the connection-based routing strategy and on algorithmic enhancements that improve serial FPGA routing. We also explore a recursive partitioning-based parallelization technique to further accelerate the routing process. To exploit more parallelism through a finer granularity in both spatial partitioning and routing, a connection-aware routing bounding box model is proposed for the source-sink connections of the nets. It is built from the locations of each connection’s source and sink and the geometric center of the net that the connection belongs to, in contrast to the existing net-based routing bounding box, which covers all the pins of the entire net. We show that the proposed connection-aware routing bounding box is more beneficial for parallel routing than the existing net-based routing bounding box. The quality and runtime of the serial and multi-threaded routers are compared to the router in VPR 7.0.7. The large heterogeneous Titan23 designs, which are targeted to a detailed representation of the Stratix IV FPGA, are used for benchmarking. With eight threads, the parallel router using the connection-aware routing bounding box model reaches a speedup of 6.1× over the serial router in VPR 7.0.7, which is 1.24× faster than the one using the existing net-based routing bounding box model, while reducing the total wirelength by 10% and the critical path delay by 7%.
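    The difference between the two bounding box models described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the router itself: the names `BoundingBox`, `net_bounding_box`, and `connection_bounding_box` are hypothetical, and the geometric center is assumed here to be the mean of the pin coordinates of the net, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class BoundingBox:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def _enclose(points: Sequence[Point]) -> BoundingBox:
    """Smallest axis-aligned box that contains all given points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return BoundingBox(min(xs), min(ys), max(xs), max(ys))


def net_bounding_box(pins: Sequence[Point]) -> BoundingBox:
    """Net-based model: one box that covers every pin of the net."""
    return _enclose(pins)


def connection_bounding_box(source: Point, sink: Point,
                            pins: Sequence[Point]) -> BoundingBox:
    """Connection-aware model: box built from the connection's source,
    its sink, and the geometric center of the whole net.

    Assumption: the geometric center is the mean of the pin coordinates.
    """
    center = (sum(p[0] for p in pins) / len(pins),
              sum(p[1] for p in pins) / len(pins))
    return _enclose([source, sink, center])


# Hypothetical net with four pins; the connection runs from (0, 0) to (2, 2).
pins = [(0, 0), (10, 0), (0, 10), (2, 2)]
net_box = net_bounding_box(pins)
conn_box = connection_bounding_box((0, 0), (2, 2), pins)
```

    For this made-up net the connection-aware box is much smaller than the net-based one, which gives a spatial partitioner more room to assign connections to separate regions. Any margin that a real router adds around the box is left out of the sketch.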

    A new adaptation of particle swarm optimization applied to modern FPGA placement

    This work presents a new adaptation of the discrete particle swarm optimization method applied to the FPGA placement problem, a crucial and time-consuming step in the FPGA synthesis flow. We evaluate the performance of the new optimizer against the existing version by embedding both into the publicly available FPGA placer Liquid, where they replace the simulated annealing-based optimizer used for hard block optimization. Benchmarking with the Titan23 circuits shows the runtime efficiency of the new optimizer, with post-routed results comparable to those of Liquid using simulated annealing.

    Hierarchical techniques for improved FPGA design tools


    MODA-PSO: towards fast hard block legalization for analytical FPGA placement

    Placement is a crucial step in the FPGA design tool flow, as it determines the overall performance of the circuits. Unfortunately, it is a time-consuming task. Analytical placers have been shown to be the most time-efficient while retaining good quality. One way of implementing analytical placement is to use an iterative technique that consists of optimization and look-ahead legalization, followed by an optional refinement step. In this work, aiming at fast hard block legalization to further accelerate analytical placement, a novel optimizer is proposed based on modified discrete adaptive particle swarm optimization. The proposed optimizer is embedded into the publicly available analytical placer Liquid. Compared to the version of Liquid that uses simulated annealing for hard block legalization, this approach results in a 30% reduction in hard block legalization time and a consequent 5% runtime reduction for the analytical placement, at the cost of only a 1% increase in post-routed wirelength and critical path delay. The results indicate that nature-inspired particle swarm optimization, with new learning strategies and adaptation, is promising for tackling such a problem.

    Hierarchical force-based block spreading for analytical FPGA placement


    Liquid: high quality scalable placement for large heterogeneous FPGAs

    Generating a configuration for an FPGA is a time-consuming task. Most of the time is spent on placement and routing. Placing one of the large Titan23 designs can take more than an hour with the placer in VPR. This is too long to allow efficient turnaround times. New placement techniques are proposed to speed up the process. LIQUID is a new fast placement prototyping technique that is based on analytical placement but does not solve the linear system of equations exactly. Instead, the blocks are moved in several small steps in the direction that reduces the placement cost the most. In this work we introduce improvements to the LIQUID placement technique so that it can be used as a high-quality placer. The main contribution is a new legalization method for the hard blocks in heterogeneous designs. We achieve a gain of 14% in wirelength cost and 15% smaller critical path delays when compared to the original version of LIQUID. Most analytical placement techniques produce a placement in two steps. First, a global placement prototype is generated. The prototype is then further optimized in a refinement step. With the improved version of LIQUID, the same quality of results as the placer in VPR is obtained without the need for a refinement step, leading to 23.7x faster runtimes.
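    The idea of moving blocks in small steps in the cost-reducing direction, rather than solving a linear system exactly, can be sketched in Python as follows. This is an illustration under stated assumptions, not LIQUID's actual algorithm: the quadratic wirelength cost, the fixed step size, and the names `quadratic_cost` and `gradient_step` are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set, Tuple

Positions = Dict[str, Tuple[float, float]]
Edge = Tuple[str, str]


def quadratic_cost(pos: Positions, edges: List[Edge]) -> float:
    """Assumed cost: sum of squared distances over two-pin connections."""
    return sum((pos[a][0] - pos[b][0]) ** 2 + (pos[a][1] - pos[b][1]) ** 2
               for a, b in edges)


def gradient_step(pos: Positions, edges: List[Edge], movable: Set[str],
                  step: float = 0.1) -> Positions:
    """Move every movable block a small step along the negative gradient
    of the quadratic cost; fixed blocks keep their position."""
    grad = {name: [0.0, 0.0] for name in movable}
    for a, b in edges:
        dx = pos[a][0] - pos[b][0]
        dy = pos[a][1] - pos[b][1]
        if a in grad:
            grad[a][0] += 2 * dx
            grad[a][1] += 2 * dy
        if b in grad:
            grad[b][0] -= 2 * dx
            grad[b][1] -= 2 * dy
    new_pos = dict(pos)
    for name, (gx, gy) in grad.items():
        new_pos[name] = (pos[name][0] - step * gx, pos[name][1] - step * gy)
    return new_pos


# Hypothetical example: one movable block between two fixed I/O pins.
start = {"io_left": (0.0, 0.0), "io_right": (10.0, 0.0), "blk": (2.0, 5.0)}
edges = [("io_left", "blk"), ("blk", "io_right")]
after_one = gradient_step(start, edges, {"blk"})

placed = start
for _ in range(50):
    placed = gradient_step(placed, edges, {"blk"})
```

    In this toy case the movable block drifts toward the midpoint between the two fixed pins, which is also the point an exact solve of the quadratic system would give. Legalization, the hard block handling that the abstract names as its main contribution, and timing cost are all outside the scope of the sketch.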

    Runtime-quality tradeoff in partitioning based multithreaded packing

    It takes a long time to generate a configuration for an FPGA starting from a description of a digital circuit in a hardware design language. This configuration should be of high quality, so that the FPGA resources are used efficiently while maximizing the clock frequency and minimizing the power consumption. In this work we present two new packing algorithms that obtain better quality and faster runtimes when compared to the frequently used AAPack packer. The partitioning-based methodology allows us to exploit the advantage of multithreading on commodity hardware. First, we demonstrate the benefits of our fully partitioning-based PartSA packer. Existing packers with a partitioning-based approach have problems with the cluster size and bandwidth constraints of the functional blocks. We added a fast simulated annealing step after partitioning to solve these problems. A gain of 26% in total wirelength is obtained while reaching up to 2.3x faster packing runtimes for large circuits on a CPU with four cores. Unfortunately, the PartSA packer cannot be used for architectures without a complete crossbar in the functional blocks. Therefore, a second packer is proposed that combines the benefits of partitioning-based and seed-based packing. MultiPart has up to 4x faster packing runtimes while still having a gain of 20% in total wirelength.